Accurate Genomic Prediction Of Human Height

Authors

  • Louis Lello
  • Steven G. Avery
  • Laurent Tellier
  • Ana Vazquez
  • Gustavo de los Campos
  • Stephen D. H. Hsu
Abstract

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ∼40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis, and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common SNP portion of the “missing heritability” problem, i.e., the gap between prediction R-squared and SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprising almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.
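The relationship between the quoted correlation (∼0.65) and the quoted variance captured (∼40 percent) can be checked with a short calculation. The following is a minimal sketch, not the authors' method: it assumes a calibrated linear predictor, for which the fraction of variance captured equals the squared correlation and the residual spread is the trait standard deviation scaled by sqrt(1 - r²). The function names and the 6.5 cm trait standard deviation are hypothetical values chosen only for illustration; they do not come from the paper.

```python
import math


def variance_explained(r: float) -> float:
    """Fraction of trait variance captured by a predictor whose
    predictions correlate r with the observed trait (r squared)."""
    return r * r


def residual_sd(r: float, trait_sd: float) -> float:
    """Standard deviation of prediction errors for a calibrated linear
    predictor with correlation r, given the trait's standard deviation."""
    return trait_sd * math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    r_height = 0.65          # correlation quoted in the abstract
    assumed_sd_cm = 6.5      # ASSUMPTION: illustrative height SD, not from the paper
    print(f"variance explained: {variance_explained(r_height):.3f}")
    print(f"residual SD (cm):   {residual_sd(r_height, assumed_sd_cm):.2f}")
```

Under these assumptions, a correlation of 0.65 corresponds to roughly 42 percent of variance, in line with the ∼40 percent figure, and the residual spread of a few centimetres is consistent with the statement that most actual heights fall within a few cm of the prediction.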

Related resources

Genomic prediction of complex human traits: relatedness, trait architecture and predictive meta-models

We explore the prediction of individuals' phenotypes for complex traits using genomic data. We compare several widely used prediction models, including Ridge Regression, LASSO and Elastic Nets estimated from cohort data, and polygenic risk scores constructed using published summary statistics from genome-wide association meta-analyses (GWAMA). We evaluate the interplay between relatedness, trai...

A Comparison of the Sensitivity of the BayesC and Genomic Best Linear Unbiased Prediction (GBLUP) Methods of Estimating Genomic Breeding Values under Different Quantitative Trait Locus (QTL) Model Assumptions

The objective of this study was to compare the accuracy of estimating and predicting breeding values using two different approaches, GBLUP and BayesC, on data simulated under different quantitative trait locus (QTL) effect distributions. Data were simulated with three distributions for the QTL effect: uniform, normal, and gamma (1.66, 0.4). The number of QTL was assumed to be...

An Upper Bound for Accuracy of Prediction Using GBLUP

This study aims at characterizing the asymptotic behavior of genomic prediction R² as the size of the reference population increases for common or rare QTL alleles through simulations. Haplotypes derived from whole-genome sequence of 85 Caucasian individuals from the 1,000 Genomes Project were used to simulate random mating in a population of 10,000 individuals for at least 100 generations to c...

Accuracy of Genomic Prediction in Switchgrass (Panicum virgatum L.) Improved by Accounting for Linkage Disequilibrium

Switchgrass is a relatively high-yielding and environmentally sustainable biomass crop, but further genetic gains in biomass yield must be achieved to make it an economically viable bioenergy feedstock. Genomic selection (GS) is an attractive technology to generate rapid genetic gains in switchgrass, and meet the goals of a substantial displacement of petroleum use with biofuels in the near fut...

Application of Artificial Neural Network and Fuzzy Inference System in Prediction of Breaking Wave Characteristics

Wave height and water depth at the breaking point are two basic parameters needed for studying coastal processes. In this study, the application of soft computing-based methods such as artificial neural network (ANN), fuzzy inference system (FIS), adaptive neuro fuzzy inference system (ANFIS) and semi-empirical models for prediction of these parameters is investigated. Th...

Journal:
  • CoRR

Volume: abs/1709.06489

Publication date: 2017